{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6b39b02d",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9c48213d-6e6a-4c10-838a-2a7c710c3a05",
   "metadata": {},
   "source": [
    "# HuggingFace LLM - StableLM"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08efa764",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5275e801",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "690a6918-7c75-4f95-9ccc-d2c4a1fe00d7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:numexpr.utils:Note: NumExpr detected 16 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n",
      "Note: NumExpr detected 16 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n",
      "INFO:numexpr.utils:NumExpr defaulting to 8 threads.\n",
      "NumExpr defaulting to 8 threads.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/loganm/miniconda3/envs/gpt_index/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext\n",
    "from llama_index.llms import HuggingFaceLLM"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17480fd9",
   "metadata": {},
   "source": [
    "#### Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ace579c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "50d3b817-b70e-4667-be4f-d3a0fe4bd119",
   "metadata": {},
   "source": [
    "#### Load documents, build the VectorStoreIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90a895ac-2e8b-4d55-97bd-ad614dceda40",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1726fff",
   "metadata": {},
   "outputs": [],
   "source": [
    "# setup prompts - specific to StableLM\n",
    "from llama_index.prompts import PromptTemplate\n",
    "\n",
    "system_prompt = \"\"\"<|SYSTEM|># StableLM Tuned (Alpha version)\n",
    "- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.\n",
    "- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.\n",
    "- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.\n",
    "- StableLM will refuse to participate in anything that could harm a human.\n",
    "\"\"\"\n",
    "\n",
    "# This will wrap the default prompts that are internal to llama-index\n",
    "query_wrapper_prompt = PromptTemplate(\"<|USER|>{query_str}<|ASSISTANT|>\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a40a357-8a8f-405d-b355-e0caf23bee3c",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:24<00:00, 12.21s/it]\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "llm = HuggingFaceLLM(\n",
    "    context_window=4096,\n",
    "    max_new_tokens=256,\n",
    "    generate_kwargs={\"temperature\": 0.7, \"do_sample\": False},\n",
    "    system_prompt=system_prompt,\n",
    "    query_wrapper_prompt=query_wrapper_prompt,\n",
    "    tokenizer_name=\"StabilityAI/stablelm-tuned-alpha-3b\",\n",
    "    model_name=\"StabilityAI/stablelm-tuned-alpha-3b\",\n",
    "    device_map=\"auto\",\n",
    "    stopping_ids=[50278, 50279, 50277, 1, 0],\n",
    "    tokenizer_kwargs={\"max_length\": 4096},\n",
    "    # uncomment this if using CUDA to reduce memory usage\n",
    "    # model_kwargs={\"torch_dtype\": torch.float16}\n",
    ")\n",
    "service_context = ServiceContext.from_defaults(chunk_size=1024, llm=llm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad144ee7-96da-4dd6-be00-fd6cf0c78e58",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens\n",
      "> [build_index_from_nodes] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 20729 tokens\n",
      "> [build_index_from_nodes] Total embedding token usage: 20729 tokens\n"
     ]
    }
   ],
   "source": [
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, service_context=service_context\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b6caf93b-6345-4c65-a346-a95b0f1746c4",
   "metadata": {},
   "source": [
    "#### Query Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85466fdf-93f3-4cb1-a5f9-0056a8245a6f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens\n",
      "> [retrieve] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens\n",
      "> [retrieve] Total embedding token usage: 8 tokens\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 2126 tokens\n",
      "> [get_response] Total LLM token usage: 2126 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens\n",
      "> [get_response] Total embedding token usage: 0 tokens\n"
     ]
    }
   ],
   "source": [
    "# set Logging to DEBUG for more detailed outputs\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c42733dd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author is a computer scientist who has written several books on programming languages and software development. He worked on the IBM 1401 and wrote a program to calculate pi. He also wrote a program to predict how high a rocket ship would fly. The program was written in Fortran and used a TRS-80 microcomputer. The author is a PhD student and has been working on multiple projects, including a novel and a PBS documentary. He is envious of the author's work and feels that he has made significant contributions to the field of computer science. He is working on multiple projects and is envious of the author's work. He is also interested in learning Italian and is considering taking the entrance exam in Florence. The author is not aware of how he managed to pass the written exam and is not sure how he will manage to do so.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6b282580",
   "metadata": {},
   "source": [
    "#### Query Index - Streaming"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "313544bc",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine(streaming=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1c5ab85-25e4-4460-8b6a-3c119d92ba48",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens\n",
      "> [retrieve] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens\n",
      "> [retrieve] Total embedding token usage: 8 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> [get_response] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens\n",
      "> [get_response] Total embedding token usage: 0 tokens\n"
     ]
    }
   ],
   "source": [
    "# set Logging to DEBUG for more detailed outputs\n",
    "response_stream = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8d94603-8e91-451f-8f60-feb0ea0c8c06",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author is a computer scientist who has written several books on programming languages and software development. He worked on the IBM 1401 and wrote a program to calculate pi. He also wrote a program to predict how high a rocket ship would fly. The program was written in Fortran and used a TRS-80 microcomputer. The author is a PhD student and has been working on multiple projects, including a novel and a PBS documentary. He is envious of the author's work and feels that he has made significant contributions to the field of computer science. He is working on multiple projects and is envious of the author's work. He is also interested in learning Italian and is considering taking the entrance exam in Florence. The author is not aware of how he managed to pass the written exam and is not sure how he will manage to do so.<|USER|>"
     ]
    }
   ],
   "source": [
    "# can be slower to start streaming since llama-index often involves many LLM calls\n",
    "response_stream.print_response_stream()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b29749ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "# can also get a normal response object\n",
    "response = response_stream.get_response()\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d14c15b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# can also iterate over the generator yourself\n",
    "generated_text = \"\"\n",
    "for text in response.response_gen:\n",
    "    generated_text += text"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
